
Predictive Power Estimation Algorithm (PPEA) - A New Algorithm to Reduce Overfitting for Genomic Biomarker Discovery



Abstract

Toxicogenomics promises to aid in predicting adverse effects, understanding the mechanisms of drug action or toxicity, and uncovering unexpected or secondary pharmacology. However, modeling adverse effects using high-dimensional, high-noise genomic data is prone to overfitting. Models constructed from such data sets often consist of a large number of genes with no obvious functional relevance to the biological effect the model is intended to predict, which can make the modeling results challenging to interpret. To address these issues, we developed a novel algorithm, the Predictive Power Estimation Algorithm (PPEA), which estimates the predictive power of each individual transcript through an iterative two-way bootstrapping procedure. By repeatedly enforcing that the sample number is larger than the transcript number in each iteration of modeling and testing, PPEA reduces the potential risk of overfitting. We show with three different case studies that: (1) PPEA can quickly derive a reliable rank order of the predictive power of individual transcripts in a relatively small number of iterations, (2) the top-ranked transcripts tend to be functionally related to the phenotype they are intended to predict, (3) using only the most predictive top-ranked transcripts greatly facilitates the development of multiplex assays such as qRT-PCR as biomarkers, and (4) more importantly, a small number of genes identified from the top-ranked transcripts are highly predictive of phenotype, as their expression changes distinguished adverse from nonadverse effects of compounds in completely independent tests. Thus, we believe that the PPEA model effectively addresses the overfitting problem and can be used to facilitate genomic biomarker discovery for predictive toxicology and drug responses.
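The abstract does not give the details of the two-way bootstrapping procedure, so the following is only a minimal sketch of one way such a scheme could look, not the published PPEA implementation. It assumes that "two-way" means resampling both samples (rows) and transcripts (columns), that testing uses the samples left out of each bootstrap draw, and that a nearest-centroid classifier stands in for the model. The function name `estimate_predictive_power`, the subset-size rule, and the synthetic data are all hypothetical.

```python
import numpy as np


def estimate_predictive_power(X, y, n_iter=500, seed=0):
    """Hypothetical sketch: score each transcript by mean held-out accuracy.

    X is an (n_samples, n_transcripts) expression matrix, y holds 0/1 labels.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_transcripts = X.shape
    score_sum = np.zeros(n_transcripts)
    n_used = np.zeros(n_transcripts)

    for _ in range(n_iter):
        # Way 1: bootstrap the samples; rows never drawn are held out for testing.
        train = rng.integers(0, n_samples, size=n_samples)
        test = np.setdiff1d(np.arange(n_samples), train)
        if test.size == 0 or np.unique(y[train]).size < 2:
            continue  # need held-out rows and both classes in training

        # Way 2: draw a random transcript subset, kept smaller than the number
        # of distinct training samples (assumed rule: half as many).
        k = max(1, min(n_transcripts, np.unique(train).size // 2))
        cols = rng.choice(n_transcripts, size=k, replace=False)

        # Assumed model: nearest class centroid on the selected transcripts.
        X_train, y_train = X[np.ix_(train, cols)], y[train]
        c0 = X_train[y_train == 0].mean(axis=0)
        c1 = X_train[y_train == 1].mean(axis=0)
        X_test = X[np.ix_(test, cols)]
        d0 = np.linalg.norm(X_test - c0, axis=1)
        d1 = np.linalg.norm(X_test - c1, axis=1)
        accuracy = np.mean((d1 < d0).astype(int) == y[test])

        # Credit the held-out accuracy to every transcript in the subset.
        score_sum[cols] += accuracy
        n_used[cols] += 1

    # Mean accuracy per transcript; transcripts never drawn score 0.
    return np.divide(score_sum, n_used, out=np.zeros(n_transcripts), where=n_used > 0)


# Synthetic demonstration data: only transcripts 0 and 1 carry signal.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 30))
X[y == 1, :2] += 4.0
scores = estimate_predictive_power(X, y)
print(np.argsort(scores)[::-1][:5])  # transcript indices, highest score first
```

In this sketch a transcript's score is the average held-out accuracy of the small models it took part in, so transcripts that carry signal accumulate higher scores over the iterations. Keeping each subset smaller than the number of distinct training samples mirrors the constraint described in the abstract that the sample number exceed the transcript number in every iteration.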
